Optimal error regions for quantum state estimation 
Rather than point estimators, states of a quantum system that represent one's best guess for the
given data, we consider optimal regions of estimators. As the natural counterpart of the popular 
maximum-likelihood point estimator, we introduce the maximum-likelihood region — the region of 
largest likelihood among all regions of the same size. Here, the size of a region is its prior probabil- 
ity. Another concept is the smallest credible region — the smallest region with pre-chosen posterior 
probability. For both optimization problems, the optimal region has constant likelihood on its 
boundary. We discuss criteria for assigning prior probabilities to regions, and illustrate the concepts 
and methods with several examples. 

PACS numbers: 03.65.Wj, 02.50.-r, 03.67.-a 
I. INTRODUCTION

Quantum state estimation (see, for example, Ref. [1])
is central to many, if not all, tasks that process quan- 
tum information. The characterization of a source of 
quantum carriers, the verification of the properties of a 
quantum channel, the monitoring of a transmission line 
used for quantum key distribution — all three require re- 
liable quantum state estimation, to name just the most 
familiar examples. 

In the typical situation that we are considering, sev- 
eral independently and identically prepared quantum- 
information carriers are measured one-by-one by an ap- 
paratus that realizes a probability-operator measurement 
(POM), suitably designed to extract the wanted informa- 
tion. The POM has a number of outcomes, with detec- 
tors that register individual information carriers (pho- 
tons in the majority of current experiments), and the 
data consist of the observed sequence of detection events
("clicks") [2].

The quantum state to be estimated is described by 
a statistical operator, the state, and the data can be 
used to determine an estimator for the state — another 
state that, so one hopes, approximates the actual state 
well. There are various strategies for finding such an estimator.
Thanks to the efficient methods that Hradil, Řeháček, and their
collaborators developed for calculating maximum-likelihood
estimators (MLEs, reviewed in Ref. [1]; see also Ref. [3]),
MLEs have become the estimators of choice.
For the given data, the MLE is the state
for which the data are more likely than for any other 
state. 

Since the data have statistical noise, one needs to sup- 
plement a point estimator with error bars of some sort — 
error regions, more generally, for higher-dimensional 
problems. Ad-hoc recipes have been proposed for attaching
a vicinity of states to a given point estimator,
often relying on approximations valid only in the limit of
a large amount of data (see Refs. [4] and [5] for examples
in quantum state estimation), or involving resampling of
the data (see, for instance, Ref. [6]). By contrast, we
wish to use systematic procedures for determining error 
or estimator regions from only the data that we did ob- 
serve. 

We are, however, not considering estimator regions 
of any kind, but specifically maximum-likelihood regions 
(MLRs). For the given data, the MLR is that region of 
pre-chosen size, for which the data are more likely than 
any other region of the same size. The regions referred 
to here are regions in the space of quantum states (more 
precisely: in the reconstruction space; see Sec. II A). As
we shall see, there is an intimate connection between the 
MLE and the MLRs for the same data: All MLRs con- 
tain the MLE, and in the limit of very small size, the 
MLR is a small vicinity of the MLE. 

The "size of a region" is clearly an important notion 
here. We agree with Evans, Guttman, and Swartz [7]
that, in the present context of state estimation, it is natural
to measure the size of a region by its prior probability
that the actual state lies in the region, that is: the prob- 
ability that we assign to the region before any data are 
at hand. As they should, regions with the same size have 
the same prior probability; and the whole state space has 
unit size = unit prior probability because the actual state 
is surely somewhere in the state space. 

In addition to MLRs, we also consider smallest credible 
regions (SCRs). The credibility of a region is its poste- 
rior probability, that is: the probability that the actual 
state lies in the region, conditioned on the data (see, for 
example, Ref. @). The SCR, then, is the smallest region 
with the pre-chosen value of the credibility. 

It turns out that the problems of finding the MLR 
and the SCR are duals of each other. Each SCR is also a 
MLR, and each MLR is a SCR. In both cases, the optimal 
regions contain all states for which the likelihood of the 
data exceeds a threshold value. In particular, in the limit 
of small credibility, the SCR is a small vicinity of the 
MLE. 

The confidence regions that were recently studied in
the quantum context by Christandl and Renner [10], and
by Blume-Kohout [11], are markedly different from the
SCRs and the MLRs. Confidence regions give an answer 
to the following question: Consider all conceivable data, 
all sequences of detector clicks that could possibly be 
obtained, and assign a region to each sequence; how do 
we choose the regions such that a pre-chosen fraction of 
the regions (the confidence level) will surely contain the 
unknown actual state? We contrast this with the corre- 
sponding question for the SCR: Consider all permissible 
states, each a candidate for the unknown actual state; 
what is the smallest region, for the observed data, that 
contains the actual state with a pre-chosen probability? 

The difference between the two questions is simple, 
yet profound. When asking for confidence regions, the 
data are regarded as the random variable; whereas the 
observed data are given for the SCR, and the unknown 
state is the random quantity. A further difference to note
is that the sizes of the confidence regions play a minor
role in their construction, whereas the size is a crucial
property of a SCR.

Here is a brief outline of the paper. We set the stage
in Sec. II, where we introduce the reconstruction space,
discuss the size of a region, and define the various joint
and conditioned probabilities. Equipped with these tools,
we then formulate in Sec. III the optimization problems
that identify the MLRs and SCRs and find their solutions;
this is followed by remarks on confidence regions.
Criteria for choosing unprejudiced priors are the subject
of Sec. IV, and simulated qubit measurements illustrate
the matter in Sec. V. We close with an outlook on the
problems that need to be solved before MLRs and SCRs 
can be computed efficiently for data acquired in actual 
experiments. 



II. SETTING THE STAGE 



A. Reconstruction space 

The K outcomes $\Pi_1, \Pi_2, \ldots, \Pi_K$ of the POM, with
which the data are acquired, are positive Hilbert-space
operators that decompose the identity,

$$\sum_{k=1}^{K} \Pi_k = 1 \quad\text{with}\quad \Pi_k \geq 0 \quad\text{for}\quad k = 1, 2, \ldots, K. \tag{1}$$

If the state $\rho$ describes the system, then the probability
$p_k$ that the $k$th detector will click for the next copy to
be measured is

$$p_k = \mathrm{tr}\{\Pi_k \rho\} = \langle \Pi_k \rangle, \tag{2}$$

which is the Born rule, of course. Here, $\rho$ can be any
positive operator with unit trace,

$$\rho \geq 0, \quad \mathrm{tr}\{\rho\} = 1. \tag{3}$$

The positivity of $\rho$ and its normalization ensure the
positivity of the $p_k$s and their normalization,

$$p_k \geq 0, \quad \sum_{k=1}^{K} p_k = 1. \tag{4}$$

Probabilities $p = (p_1, p_2, \ldots, p_K)$ for which there is a
state $\rho$ such that Eq. (2) holds are permissible probabilities.
They make up the probability space.

The probability space for a K-outcome POM is usually
smaller than that of a K-sided die because not all positive
$p_k$s with unit sum are permitted by the Born rule. The
quantum nature of the state estimation problem enters 
only in these additional restrictions on p: Quantum state 
estimation is standard statistical state estimation with 
quantum constraints. The rich concepts and methods of 
statistical inference apply immediately to the quantum 
situation, modified where necessary to account for the 
restricted probability space. 
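
As a concrete illustration of the Born rule of Eq. (2) and of the additional quantum restrictions, the following minimal sketch computes the probabilities for a single qubit measured with a four-outcome tetrahedron POM. The choice of POM, the test state, and all function names are assumptions made for this illustration; they are not taken from the text above.

```python
import math

# Identity and Pauli matrices as nested lists.
I2 = [[1, 0], [0, 1]]
SX = [[0, 1], [1, 0]]
SY = [[0, -1j], [1j, 0]]
SZ = [[1, 0], [0, -1]]

def bloch_operator(weight, vec):
    """Return weight * (1 + vec . sigma) as a 2x2 matrix."""
    x, y, z = vec
    return [[weight * (I2[i][j] + x * SX[i][j] + y * SY[i][j] + z * SZ[i][j])
             for j in range(2)] for i in range(2)]

def trace_product(a, b):
    """tr(a b) for 2x2 matrices."""
    return sum(a[i][j] * b[j][i] for i in range(2) for j in range(2))

# Assumed example POM: four outcomes pointing to the corners of a tetrahedron.
s = 1 / math.sqrt(3)
TETRA = [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)]
POM = [bloch_operator(0.25, a) for a in TETRA]

def born_probabilities(rho):
    """p_k = tr(Pi_k rho) for all outcomes of the example POM."""
    return [trace_product(pi_k, rho).real for pi_k in POM]

# Hypothetical qubit state with Bloch vector r, |r|^2 = 0.38 < 1.
rho = bloch_operator(0.5, (0.3, -0.2, 0.5))
p = born_probabilities(rho)
print(p, sum(p), sum(pk * pk for pk in p))
```

For this particular POM, one can check that the sum of the squared probabilities equals 1/4 + |r|^2/12 and so never exceeds 1/3, whereas a four-sided die admits any value up to 1. This is one explicit instance of the restricted probability space.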

Whereas the $p_k$s are uniquely determined by $\rho$ in accordance
with Eq. (2), the converse is only true if the
POM is informationally complete. In any case, there is
always a reconstruction space $\mathcal{R}_0$, a set of $\rho$s that
contains exactly one $\rho$ for each set of permissible probabilities,
consistent with the Born rule. If there is more than
one reconstruction space, it does not matter which one 
we choose. While the probability space is always convex, 
a convex reconstruction space may not be available. 

The reconstruction space is at most (K − 1)-dimensional,
and has a smaller dimension if fewer probabilities
are independent. We note that K is always finite, and so 
is the dimension of the reconstruction space. There are 
no real-life POMs with an infinite number of outcomes. 

As an example, consider a harmonic oscillator with its 
infinite-dimensional state space. If the POM has two 
outcomes with $p_1$ equal to the probability of finding the
oscillator in its ground state, and $p_2 = 1 - p_1$, one
reconstruction space is the set of convex combinations of
the projector to the ground state and another state with 
no ground-state component. In this situation, there is 
a large variety of reconstruction spaces to choose from, 
because any other state serves the purpose, and all one 
can infer from the data is an estimate of the ground-state 
probability. 

Now, state estimation is the task of finding a state, or 
a region of states, in the reconstruction space by a sys- 
tematic and reliable procedure that exploits the observed 
data. In view of the one-to-one correspondence between 
the states in the reconstruction space and the permissible 
probabilities, one can identify the reconstruction space 
with the probability space. Indeed, since the probability 
space is unique, while there can be many different recon- 
struction spaces, it is often more convenient to work in 
the probability space. The primary objective is then to 
find an estimator, or a region of estimators, for the
probabilities p. The conversion of the set of probabilities p into
a state $\rho$ is performed later, if at all, and only at this stage
do we need to decide which reconstruction space is used
for reference. If the POM is not informationally complete,
it will be necessary to invoke additional criteria or
principles for a unique mapping $p \to \rho$. For example, one
could follow Jaynes's guidance [12, 13] and maximize the
entropy (see also Ref. [15]).



B. Size and prior content of a region 

Prior to acquiring any data, we assign equal probabili- 
ties to equivalent alternatives. If we split the reconstruc- 
tion space in two, it is equally likely that the actual state 
is in either half and, therefore, each half should carry 
a prior probability of 1/2, provided that the splitting-in-two
is fair, that is: the two pieces are of equal size. A
preconceived notion of size is taken for granted here. Fur- 
ther fair splitting, into more disjoint regions of equal size, 
then suggests rather strongly that the prior probability of 
a region should be proportional to its size. We take this 
suggestion seriously: Scale all region sizes such that the 
whole reconstruction space has unit size, and then the 
size of a region is its prior probability — its "prior con- 
tent" if we borrow terminology from Bayesian statistics. 

The identification "size = prior probability" is techni- 
cally possible because both quantities simply add if dis- 
joint regions are combined into a single region. There 
is no room for mathematical inconsistencies here, unless 
we begin with a region-to-size mapping for which the re- 
construction space cannot be normalized to unit size, so 
that we would obtain improper prior probabilities. We 
are not interested in pathological cases of this or other 
kinds and just exclude them. Should an improper prior 
be useful in a particular context, it should come about 
as the limit of a well-defined sequence of proper priors. 

The above line of reasoning can be reversed. Should 
we have established each region's prior probability with 
other means (perhaps invoking symmetry arguments or 
taking into account that the source under investigation 
is designed to emit the information carriers in a certain 
target state; see Sec. IV), then we accept this as the
natural measure of the region's size @. After all, the 
reconstruction space is an abstract construct that is not 
endowed with a self-suggesting unique metric, and a re- 
gion's prior probability is the quantity that matters most 
in the present context of statistical inference. 

We denote by $(d\rho)$ the size of the infinitesimal vicinity
of state $\rho$. The size $S_{\mathcal{R}}$ of a region $\mathcal{R} \subseteq \mathcal{R}_0$ is then
obtained by integrating over the region,

$$S_{\mathcal{R}} = \int_{\mathcal{R}} (d\rho) \quad\text{with}\quad \int_{\mathcal{R}_0} (d\rho) = 1, \tag{5}$$

where the latter integration covers all of the reconstruction
space. By construction, the value of $S_{\mathcal{R}}$ does not
depend on the parameterization that we use for the numerical
representation of $(d\rho)$. The primary parameterization
is in terms of the probabilities,

$$(d\rho) = (dp)\, w(p) \quad\text{with}\quad (dp) = dp_1\, dp_2 \cdots dp_K, \tag{6}$$

where the prior density $w(p)$ is nonzero for all permissible
probabilities and vanishes for all non-permissible ps. In
particular, $w(p)$ always contains

$$w_0(p) = \eta(p_1)\, \eta(p_2) \cdots \eta(p_K)\, \delta(p_1 + p_2 + \cdots + p_K - 1) \tag{7}$$

as a factor and so enforces the constraints that the probabilities
are positive and have unit sum [16]. If there are
no other constraints, we have the probability space of a
K-sided die. For genuine quantum measurements, however,
there are additional constraints, some accounted for
by more delta-function factors, others by step functions.
The delta-function constraints reduce the dimension of
the reconstruction space from K − 1 to the number of
independent probabilities.

For the harmonic-oscillator example of Sec. II A, which
has the same probability space as a tossed coin, the factor
$w_0(p)$ selects the line segment with $0 \leq p_1 = 1 - p_2 \leq 1$
in the $p_1 p_2$ plane. If we choose the "primitive prior"
$(d\rho) = (dp)\, w_0(p)$, the subsegment with $a < p_1 < b$ has
size $b - a$. For the Jeffreys prior [17], a popular choice of
an unprejudiced prior [18],

$$(d\rho) = (dp)\, w_0(p)\, \frac{1}{\pi \sqrt{p_1 p_2}}, \tag{8}$$

the same subsegment has size $\frac{2}{\pi}\bigl[\sin^{-1}(\sqrt{b}) - \sin^{-1}(\sqrt{a})\bigr]$.
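
The two subsegment sizes quoted for the coin-like example can be checked numerically. The sketch below is an illustration under stated assumptions: the function names and the test interval 0.2 < p_1 < 0.7 are chosen here, and the Jeffreys density is taken to be 1/(π sqrt(p_1 p_2)) as in Eq. (8).

```python
import math

def size_primitive(a, b):
    """Size of the subsegment a < p1 < b under the primitive prior."""
    return b - a

def size_jeffreys(a, b):
    """Size of the same subsegment under the Jeffreys prior of Eq. (8)."""
    return (2 / math.pi) * (math.asin(math.sqrt(b)) - math.asin(math.sqrt(a)))

def size_numeric(density, a, b, steps=100_000):
    """Midpoint-rule integral of a prior density over a < p1 < b."""
    h = (b - a) / steps
    return h * sum(density(a + (i + 0.5) * h) for i in range(steps))

def jeffreys_density(p1):
    """Jeffreys prior density on the segment p2 = 1 - p1."""
    return 1 / (math.pi * math.sqrt(p1 * (1 - p1)))

a, b = 0.2, 0.7  # hypothetical subsegment
print(size_primitive(a, b), size_jeffreys(a, b))
print(size_numeric(jeffreys_density, a, b))
```

Both priors assign unit size to the full segment, but they disagree on the size of any given subsegment, which is the point of the comparison.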

In this example, and also in those we use for illustration
in Sec. V below, it is easy to state quite explicitly the
restrictions on the set of permissible probabilities that 
follow from the Born rule. In other situations, it could 
be difficult or impossible. This is why state estimation 
is often done by searching for a statistical operator in a 
suitable state space. For practical reasons, it may be nec- 
essary to truncate the full state space — which can be, and 
often is, infinite-dimensional — to a test space of manage- 
able size. With such a truncation one accepts that not 
all permissible probabilities are investigated. Therefore, 
a criterion for judging if the test space is large enough is 
to verify that the estimated probabilities do not change 
significantly when the space is enlarged. Examples for 
the artifacts that result from test spaces that are too 
small can be found in Ref. [19].



C. Point likelihood, region likelihood, credibility 

The data D acquired by the POM consist of a sequence
of detector clicks, with a total of $n_k$ clicks of the $k$th
detector, and a total number of $N = n_1 + n_2 + \cdots + n_K$ clicks
after measuring N quantum-information carriers [20].
The probability of obtaining the data, if $\rho$ is the state,
is the familiar point likelihood

$$L(D|\rho) = p_1^{n_1}\, p_2^{n_2} \cdots p_K^{n_K}. \tag{9}$$

It attains its maximal value when $\rho$ is the MLE $\rho_{\rm ML}$,

$$\max_{\rho} L(D|\rho) = L(D|\rho_{\rm ML}), \tag{10}$$

where $\rho_{\rm ML}$ is in the reconstruction space, but the
maximum could be taken over all states.
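
For the two-outcome example of Sec. II A, where the probability space is that of a tossed coin and no further quantum constraints apply, the point likelihood of Eq. (9) and its maximum in Eq. (10) can be evaluated directly. The following is a minimal sketch; the counts and the brute-force grid search are assumptions of this illustration, and in constrained problems the maximum need not sit at the relative frequencies.

```python
def point_likelihood(counts, probs):
    """L(D|p) = p_1^{n_1} p_2^{n_2} ... p_K^{n_K}, as in Eq. (9)."""
    value = 1.0
    for n_k, p_k in zip(counts, probs):
        value *= p_k ** n_k
    return value

def coin_mle_by_grid(counts, grid_points=10_001):
    """Brute-force maximization over p1 in [0, 1] for two outcomes."""
    best_p1, best_value = 0.0, -1.0
    for i in range(grid_points):
        p1 = i / (grid_points - 1)
        value = point_likelihood(counts, (p1, 1 - p1))
        if value > best_value:
            best_p1, best_value = p1, value
    return best_p1, best_value

counts = (3, 7)  # hypothetical data: n_1 = 3, n_2 = 7
p1_ml, l_max = coin_mle_by_grid(counts)
print(p1_ml, l_max)
```

In this unconstrained case the grid maximum coincides with the relative frequency n_1/N.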

The joint probability of finding the state $\rho$ in the region
$\mathcal{R}$ and obtaining the data D is then

$$\mathrm{prob}(D \wedge \mathcal{R}) = \int_{\mathcal{R}} (d\rho)\, L(D|\rho). \tag{11}$$

If $\mathcal{R} = \mathcal{R}_0$, we have the prior likelihood $L(D)$,

$$\mathrm{prob}(D \wedge \mathcal{R}_0) = L(D) = \int_{\mathcal{R}_0} (d\rho)\, L(D|\rho). \tag{12}$$

Since one of the click sequences is surely observed, the
likelihoods of Eqs. (9) and (12) have unit sum,

$$\sum_D L(D|\rho) = \sum_{n_1, \ldots, n_K} \frac{N!}{n_1!\, n_2! \cdots n_K!}\, \delta_{N,\, n_1 + n_2 + \cdots + n_K}\, p_1^{n_1} p_2^{n_2} \cdots p_K^{n_K} = (p_1 + p_2 + \cdots + p_K)^N = 1,$$
$$\sum_D L(D) = \int_{\mathcal{R}_0} (d\rho) = 1. \tag{13}$$

We factor the joint probability $\mathrm{prob}(D \wedge \mathcal{R})$ in two
different ways,

$$\mathrm{prob}(D \wedge \mathcal{R}) = L(D|\mathcal{R})\, S_{\mathcal{R}} = C_{\mathcal{R}}(D)\, L(D), \tag{14}$$

and so identify the region likelihood $L(D|\mathcal{R})$ and the
credibility $C_{\mathcal{R}}(D)$. Both quantities are conditional probabilities:
The region likelihood is the probability of obtaining
the data D if the state is in the region $\mathcal{R}$; the credibility
is the probability that the actual state is in the region $\mathcal{R}$
if the data D were obtained, that is, the posterior probability of
$\mathcal{R}$.
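
The two factorizations in Eq. (14) can be made tangible for the coin-like example. The sketch below integrates numerically over an interval of p_1; the interval, the counts, the default primitive prior, and the function names are all assumptions of this illustration.

```python
def integrate(f, a, b, steps=20_000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def likelihood(p1, n1, n2):
    """Point likelihood for two outcomes with counts (n1, n2)."""
    return p1 ** n1 * (1 - p1) ** n2

def region_quantities(a, b, n1, n2, prior=lambda p1: 1.0):
    """Return (size, joint probability, region likelihood, credibility)
    for the region a < p1 < b; the default prior is the primitive prior."""
    weighted = lambda p1: prior(p1) * likelihood(p1, n1, n2)
    size = integrate(prior, a, b)                    # Eq. (5)
    joint = integrate(weighted, a, b)                # Eq. (11)
    prior_likelihood = integrate(weighted, 0.0, 1.0) # Eq. (12)
    return size, joint, joint / size, joint / prior_likelihood

# Hypothetical region 0.25 < p1 < 0.75 and one click per outcome.
size, joint, region_lik, cred = region_quantities(0.25, 0.75, 1, 1)
print(size, joint, region_lik, cred)
```

For these inputs the region has size 1/2 but credibility 11/16, so the data raise the probability assigned to the region.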



III. OPTIMAL ERROR REGIONS 
A. Maximum-likelihood regions 

Instead of looking for the MLE, the single point in the
reconstruction space that has the largest likelihood for
the given data D, we desire a region with the largest
likelihood, the MLR. For this purpose, we maximize the
region likelihood $L(D|\mathcal{R})$ under the constraint that only
regions with a pre-chosen size s participate in the competition,
with $0 < s < 1$; an unconstrained maximization
of $L(D|\mathcal{R})$ is not meaningful because it gives the limiting
region that consists of nothing but the point $\rho_{\rm ML}$. The
resulting MLR $\mathcal{R}_{\rm ML}$ is a function of the data D and the size
s, but we wish to not overload the notation and will keep
these dependences implicit, just like the notation does
not explicitly indicate the D dependence of the MLE $\rho_{\rm ML}$.

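The MLR analog of the MLE definition in Eq. (10) is
then

$$\max_{\mathcal{R} \subseteq \mathcal{R}_0} L(D|\mathcal{R}) = L(D|\mathcal{R}_{\rm ML}) \quad\text{with}\quad S_{\mathcal{R}} = s. \tag{15}$$

FIG. 1: Infinitesimal variation of region $\mathcal{R}$. The boundary of
region $\mathcal{R}$ (solid line) is deformed to become the boundary of
region $\mathcal{R} + \delta\mathcal{R}$ (dashed line).

Since all competing regions have the same size, we can
equivalently maximize the joint probability,

$$\max_{\mathcal{R} \subseteq \mathcal{R}_0} \mathrm{prob}(D \wedge \mathcal{R}) = \mathrm{prob}(D \wedge \mathcal{R}_{\rm ML}) \quad\text{with}\quad S_{\mathcal{R}} = s. \tag{16}$$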

The answer to this maximization problem is given in 
Corollary 4 of Ref. and justified by a detailed proof 
of considerable mathematical sophistication. We proceed 
to offer an alternative argument that is perhaps more ac- 
cessible to the working physicist. 

Owing to the maximum property of the MLR and its
fixed size, both $\mathrm{prob}(D \wedge \mathcal{R})$ and $S_{\mathcal{R}}$ must be stationary
under infinitesimal variations $\delta\mathcal{R}$ of the region $\mathcal{R}$. Such
an infinitesimal variation is achieved by deforming the
boundary $\partial\mathcal{R}$ of the region, as illustrated in Fig. 1. The
resulting change in the size $S_{\mathcal{R}}$ vanishes for all permissible
deformations,

$$\delta S_{\mathcal{R}} = \int_{\partial\mathcal{R}} d\vec{A}(\rho) \cdot \delta\vec{\epsilon}(\rho) = 0. \tag{17}$$

Here, $d\vec{A}(\rho)$ is the vectorial surface element of the boundary
$\partial\mathcal{R}$ at point $\rho$ in the reconstruction space, and $\delta\vec{\epsilon}(\rho)$
is the infinitesimal displacement of the point $\rho$ that deforms
$\mathcal{R}$ into $\mathcal{R} + \delta\mathcal{R}$.

The corresponding change in $\mathrm{prob}(D \wedge \mathcal{R})$ is

$$\delta\, \mathrm{prob}(D \wedge \mathcal{R}) = \int_{\partial\mathcal{R}} d\vec{A}(\rho) \cdot \delta\vec{\epsilon}(\rho)\, L(D|\rho) = 0, \tag{18}$$

which attains the indicated value of 0 at the extremum
$\mathcal{R} = \mathcal{R}_{\rm ML}$. If we have the situation sketched in the top-left
plot of Fig. 2, where $\mathcal{R}_{\rm ML}$ is completely in the interior
of the reconstruction space, both Eqs. (17) and (18)
must hold simultaneously for arbitrary infinitesimal
deformation $\delta\mathcal{R}$. This is possible only if the point likelihood
$L(D|\rho)$ is constant on the boundary $\partial\mathcal{R}_{\rm ML}$ of $\mathcal{R}_{\rm ML}$, that is:
$\partial\mathcal{R}_{\rm ML}$ is an iso-likelihood surface (ILS). Furthermore, $\mathcal{R}_{\rm ML}$
must correspond to the interior of this ILS (as opposed
to its complement in the reconstruction space), since the
concavity of the logarithm of the point likelihood implies
that the interior necessarily has larger likelihood values
than its complement [21].

FIG. 2: MLRs of two kinds. In the top-left sketch, $\mathcal{R}_{\rm ML}$ is
completely contained inside the reconstruction space; in the
bottom-right sketch, the boundary $\partial\mathcal{R}_{\rm ML}$ of $\mathcal{R}_{\rm ML}$ contains a
part of the surface $\partial\mathcal{R}_0$ of the reconstruction space. Dotted
lines indicate iso-likelihood surfaces, that is: surfaces on
which the point likelihood is constant.
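
The step from the two stationarity conditions, Eqs. (17) and (18), to the iso-likelihood property can be spelled out with a Lagrange multiplier. The following is only a sketch of that standard argument for a region in the interior of the reconstruction space; the multiplier $\mu$ is notation introduced here.

```latex
\delta\bigl[\mathrm{prob}(D\wedge\mathcal{R}) - \mu\, S_{\mathcal{R}}\bigr]
  = \int_{\partial\mathcal{R}} d\vec{A}(\rho)\cdot\delta\vec{\epsilon}(\rho)\,
    \bigl[L(D|\rho) - \mu\bigr] = 0
  \quad\text{for all}\quad \delta\vec{\epsilon}(\rho)
\quad\Longrightarrow\quad
L(D|\rho) = \mu \quad\text{on}\quad \partial\mathcal{R}_{\mathrm{ML}} .
```

The constant $\mu$ plays the role of the likelihood threshold that characterizes the boundary.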

If the boundary $\partial\mathcal{R}_{\rm ML}$ of $\mathcal{R}_{\rm ML}$ contains a part of the
surface $\partial\mathcal{R}_0$ of the reconstruction space, which is the
situation on the bottom-right in Fig. 2, all interior points
on $\partial\mathcal{R}_{\rm ML}$ must still lie on an ILS, or else we can always
deform $\partial\mathcal{R}_{\rm ML}$ to attain a larger value of the region
likelihood with a permissible choice of $\delta\vec{\epsilon}(\rho)$. On the $\partial\mathcal{R}_0$
part of $\partial\mathcal{R}_{\rm ML}$, the point likelihood $L(D|\rho)$ has larger values
than the constant value on the interior part of the
boundary, because ILSs that are inside $\mathcal{R}_{\rm ML}$ (dashed in
Fig. 2) and have endpoints in $\partial\mathcal{R}_0$ assign their larger
likelihood values to these points. Therefore, deforming
the $\partial\mathcal{R}_0$ part of $\partial\mathcal{R}_{\rm ML}$ inwards, with the change in size
compensated for by an outwards deformation of the
interior part of $\partial\mathcal{R}_{\rm ML}$, decreases the value of the region
likelihood. And since outwards deformations of $\partial\mathcal{R}_0$ are
not possible, a region with an ILS as interior part of the
boundary, supplemented by a part of $\partial\mathcal{R}_0$, is a possible
MLR, indeed.

In summary, the MLRs of various sizes s consist of 
all states $\rho$ for which the point likelihood $L(D|\rho)$ exceeds
a certain threshold value, with higher thresholds
for smaller sizes. Quite remarkably and somewhat sur- 
prisingly, the set of MLRs does not depend on the chosen 
prior. The shape of a MLR is fully determined by the 
point likelihood and the threshold value; the prior enters 



only when the size, region likelihood, and credibility of 
the MLR are calculated. 

It is expedient to specify the threshold value as a fraction
of the maximum value $L(D|\rho_{\rm ML})$ of the point likelihood.
Denoting this fraction by $\lambda$, the characteristic
function of the corresponding bounded-likelihood region
(BLR) $\mathcal{R}_\lambda$ is the step function

$$\eta_{\mathcal{R}_\lambda}(\rho) = \eta\bigl(L(D|\rho) - \lambda L(D|\rho_{\rm ML})\bigr), \tag{19}$$

where

$$\eta_{\mathcal{R}}(\rho) = \begin{cases} 1 & \text{if } \rho \text{ is in } \mathcal{R} \\ 0 & \text{else} \end{cases} \tag{20}$$

is the characteristic function of region $\mathcal{R}$. BLRs have
appeared previously in standard statistical analysis; see
Ref. and references therein.
The BLR $\mathcal{R}_\lambda$ has the size

$$s_\lambda = \int_{\mathcal{R}_0} (d\rho)\, \eta_{\mathcal{R}_\lambda}(\rho), \tag{21}$$

and we have $\mathcal{R}_\lambda = \mathcal{R}_0$ and $s_\lambda = s_0 = 1$ for $\lambda \leq \lambda_0$ with
$\lambda_0 \geq 0$ given by

$$\min_{\rho} L(D|\rho) = \lambda_0 L(D|\rho_{\rm ML}). \tag{22}$$

As $\lambda$ increases from $\lambda_0$ to 1, $s_\lambda$ decreases monotonically
from 1 to 0. The size s specified in Eq. (15) is obtained
for an intermediate $\lambda$ value, and the corresponding BLR
is the looked-for MLR.

The MLE is contained in all MLRs. In the $s \to 0$ limit,
the MLR becomes an infinitesimal vicinity of the MLE
and the region likelihood of the limit region is equal to
the point likelihood of the MLE, $L(D|\mathcal{R}_{\rm ML}) \to L(D|\rho_{\rm ML})$.
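
To see how a pre-chosen size s selects the threshold λ, and that the shape of the BLR does not depend on the prior while its size does, consider the coin-like example of Sec. II A with one click for each of the two outcomes. The sketch below is an illustration only: the bisection routine and all names are assumptions of this example, and the sizes use the primitive and Jeffreys priors of Sec. II B.

```python
import math

def blr_interval(lam):
    """BLR for one click each (N = 2): 4*p1*p2 >= lam,
    i.e. |p1 - p2| <= sqrt(1 - lam). The interval is prior-independent."""
    half_width = math.sqrt(1 - lam) / 2
    return 0.5 - half_width, 0.5 + half_width

def size_primitive(lam):
    """Size of the BLR under the primitive prior."""
    lo, hi = blr_interval(lam)
    return hi - lo

def size_jeffreys(lam):
    """Size of the same BLR under the Jeffreys prior."""
    lo, hi = blr_interval(lam)
    return (2 / math.pi) * (math.asin(math.sqrt(hi)) - math.asin(math.sqrt(lo)))

def lambda_for_size(size_fn, s, tol=1e-12):
    """Bisection; relies on size_fn decreasing monotonically from 1 to 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if size_fn(mid) > s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

s = 0.5  # hypothetical pre-chosen size
lam_prim = lambda_for_size(size_primitive, s)
lam_jeff = lambda_for_size(size_jeffreys, s)
print(lam_prim, blr_interval(lam_prim))
print(lam_jeff, blr_interval(lam_jeff))
```

The same family of intervals serves both priors; only the threshold that corresponds to size 1/2 differs.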

B. Smallest credible regions 

The MLR is the region for which the observed data are
particularly likely. With a reversal of emphasis, we now
look for a region that contains the actual state with high
probability. Ultimately, this is the SCR $\mathcal{R}_{\rm SC}$: the smallest
region for which the credibility has the pre-chosen value c.

For the given D, the optimization problem

$$\min_{\mathcal{R} \subseteq \mathcal{R}_0} S_{\mathcal{R}} = S_{\mathcal{R}_{\rm SC}} \quad\text{with}\quad C_{\mathcal{R}}(D) = c \tag{23}$$

is dual to that of Eqs. (15) and (16). Here we minimize
the size for given joint probability, there we maximize
the joint probability for given size. It follows that the
BLRs of Eq. (19) are not only the MLRs, they are also
the SCRs: Each MLR is a SCR, each SCR is a MLR.
The BLR $\mathcal{R}_\lambda$ has the credibility

$$c_\lambda = \frac{1}{L(D)} \int_{\mathcal{R}_0} (d\rho)\, \eta_{\mathcal{R}_\lambda}(\rho)\, L(D|\rho), \tag{24}$$

which, just like $s_\lambda$, decreases monotonically from 1 to 0
as $\lambda$ increases from $\lambda_0$ to 1. The credibility c specified in
Eq. (23) is obtained for an intermediate value, and the
corresponding BLR is the looked-for SCR.
C. Size and credibility of a BLR

The responses of the size $s_\lambda$ and the credibility $c_\lambda$ of a
BLR to an infinitesimal change of $\lambda$ are linked by

$$L(D)\, \frac{\partial}{\partial\lambda} c_\lambda = L(D|\rho_{\rm ML})\, \lambda\, \frac{\partial}{\partial\lambda} s_\lambda. \tag{25}$$

Therefore, once $s_\lambda$ is known as a function of $\lambda$, we obtain
$c_\lambda$ by an integration,

$$c_\lambda = \frac{\lambda s_\lambda + \int_\lambda^1 d\lambda'\, s_{\lambda'}}{\int_0^1 d\lambda'\, s_{\lambda'}}. \tag{26}$$

This is, of course, consistent with the limiting values for
$\lambda \leq \lambda_0$ and $\lambda = 1$, and also establishes that, for all
intermediate values, the credibility of a BLR is larger than its
size,

$$c_\lambda > s_\lambda \quad\text{for}\quad \lambda_0 < \lambda < 1. \tag{27}$$
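
The inequality (27) follows from Eq. (26) in one line. The rearrangement below is a sketch added for convenience, using only that $s_\lambda$ lies strictly between 0 and 1 for intermediate $\lambda$ and equals 1 for $\lambda \leq \lambda_0$:

```latex
c_\lambda - s_\lambda
  = \frac{ s_\lambda \int_0^\lambda d\lambda'\,\bigl(1 - s_{\lambda'}\bigr)
         + \bigl(1 - s_\lambda\bigr) \int_\lambda^1 d\lambda'\, s_{\lambda'} }
         { \int_0^1 d\lambda'\, s_{\lambda'} }
  \;>\; 0
  \quad\text{for}\quad \lambda_0 < \lambda < 1 .
```

Both terms in the numerator are nonnegative, and the second one is strictly positive in the stated range.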



Further, Eqs. (25) and (26) tell us that in the $\lambda \to 1$
limit, when both $s_\lambda$ and $c_\lambda$ vanish, their ratio is finite
and exceeds unity,

$$\frac{c_\lambda}{s_\lambda} \to \frac{1}{\int_0^1 d\lambda'\, s_{\lambda'}} = \frac{L(D|\rho_{\rm ML})}{L(D)} > 1 \quad\text{for}\quad \lambda \to 1. \tag{28}$$

We note that this provides the value of $L(D)$, since the
maximal value $L(D|\rho_{\rm ML})$ of the point likelihood is computed
earlier as it is needed for identifying the BLRs.
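
Equation (26) lends itself to a simple numerical recipe: tabulate the size as a function of λ and integrate. The sketch below does this for an assumed size function, s_λ = sqrt(1 − λ), which is the primitive-prior result for the coin-like example with one click per outcome; the function names and the midpoint rule are choices made for this illustration.

```python
import math

def integrate(f, a, b, steps=20_000):
    """Midpoint-rule integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def credibility_from_size(size_fn, lam):
    """Eq. (26): c = (lam * s(lam) + int_lam^1 s) / int_0^1 s."""
    numerator = lam * size_fn(lam) + integrate(size_fn, lam, 1.0)
    return numerator / integrate(size_fn, 0.0, 1.0)

def size_primitive(lam):
    """Assumed size function of the BLR (primitive prior, one click each)."""
    return math.sqrt(1 - lam)

lam = 0.75
c_numeric = credibility_from_size(size_primitive, lam)
c_closed = 0.5 * (2 + lam) * math.sqrt(1 - lam)
# Limiting ratio of Eq. (28): maximal point likelihood over prior likelihood.
limit_ratio = 1 / integrate(size_primitive, 0.0, 1.0)
print(c_numeric, c_closed, limit_ratio)
```

For this example the limiting ratio is 3/2, in agreement with a maximal point likelihood of 1/4 and a prior likelihood of 1/6.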

Inasmuch as the value of $s_\lambda$ quantifies our prior belief
that the actual state is in $\mathcal{R}_\lambda$, we are surprised when
the data tell us that the probability for finding the state 
in that region is larger. Accordingly, the SCR is the 
region for which we are most surprised for the given prior 
belief [23]. This matter and other aspects of Bayesian
inference based on the concept of relative surprise are 
discussed in Ref. 

The relation (26) is also of considerable practical importance
because we only need to evaluate the multi-dimensional
integrals of Eq. (21), but not those of
Eqs. (24) and (12). Since the latter integrals require
well-tailored Monte-Carlo methods to handle the typically
sharply peaked likelihood function, the numerical
effort is very substantially reduced if we only need to
evaluate the integral of Eq. (21).

Indeed, the estimator regions for the observed data are
conveniently and concisely communicated by reporting
$s_\lambda$ and $c_\lambda$ as functions of $\lambda$. The end user interested in
the MLR with the size of his liking or the SCR of her
wanted credibility can thus determine the corresponding
value of $\lambda$. It is then an easy matter to check if any
particular $\rho$ is inside the specified region or not.

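Once more, we use the simple harmonic-oscillator example
of Sec. II A for illustration. Suppose N = 2
copies have been measured, and we obtained one click
each for the two outcomes, so that the point likelihood
is equal to $p_1 p_2$. In this situation, we have $\lambda_0 = 0$ and
$\eta_{\mathcal{R}_\lambda}(p) = \eta(4 p_1 p_2 - \lambda)$, so that $|p_1 - p_2| \leq \sqrt{1 - \lambda}$ for the
BLR $\mathcal{R}_\lambda$. This gives

$$s_\lambda = \sqrt{1 - \lambda}, \quad c_\lambda = \frac{1}{2}(2 + \lambda)\sqrt{1 - \lambda} \tag{29}$$

for the primitive prior, and

$$s_\lambda = 1 - \frac{2}{\pi}\sin^{-1}(\sqrt{\lambda}), \quad c_\lambda = 1 - \frac{2}{\pi}\sin^{-1}(\sqrt{\lambda}) + \frac{2}{\pi}\sqrt{\lambda(1 - \lambda)} \tag{30}$$

for the Jeffreys prior.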
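
With the closed forms of Eqs. (29) and (30), finding the SCR for a pre-chosen credibility reduces to a one-dimensional root search in λ. The following sketch assumes those closed forms for the credibility and uses a bisection routine and names introduced here; the credibility value 0.8 is chosen for illustration, and the numbers are not claimed to reproduce any figure.

```python
import math

def credibility_primitive(lam):
    """c_lambda of Eq. (29), primitive prior, one click per outcome."""
    return 0.5 * (2 + lam) * math.sqrt(1 - lam)

def credibility_jeffreys(lam):
    """c_lambda of Eq. (30), Jeffreys prior, one click per outcome."""
    return (1 - (2 / math.pi) * math.asin(math.sqrt(lam))
            + (2 / math.pi) * math.sqrt(lam * (1 - lam)))

def scr_interval(cred_fn, c, tol=1e-12):
    """Bisection on lambda (cred_fn decreases from 1 to 0), then return the
    threshold and the BLR interval |p1 - p2| <= sqrt(1 - lambda) in terms of p1."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cred_fn(mid) > c:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    half_width = math.sqrt(1 - lam) / 2
    return lam, (0.5 - half_width, 0.5 + half_width)

lam_p, (lo_p, hi_p) = scr_interval(credibility_primitive, 0.8)
lam_j, (lo_j, hi_j) = scr_interval(credibility_jeffreys, 0.8)
print(lam_p, lo_p, hi_p)
print(lam_j, lo_j, hi_j)
```

Checking whether a given p_1 belongs to the SCR is then a comparison against the two interval endpoints.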



D. Confidence regions 

The confidence regions that were recently studied
by Christandl and Renner [10], and independently by
Blume-Kohout [11], are markedly different from the
MLRs and the SCRs. The MLR and the SCR represent
inferences drawn about the unknown state $\rho$ from the
data D that have actually been observed. By contrast,
confidence regions are a set of regions, one region for each
conceivable data D, whether observed or not, from the measurement of
N copies. The confidence regions would contain any state
in, at least, a certain fraction of many N-copy measurements,
if the many measurements were performed. This
fraction is the confidence level.

When denoting by $C_D$ the confidence region for data
D, the confidence level $\gamma$ of the set C of $C_D$s for all
conceivable data (for fixed N) is

$$\gamma(C) = \min_{\rho} \sum_D L(D|\rho)\, \eta_{C_D}(\rho), \tag{31}$$

where the minimum is reached in the "worst case." For
example, in the security analysis of a protocol for quantum
key distribution, one wishes a large value of $\gamma$ to
protect against an adversary who controls the source and
prepares the quantum-information carriers in the state
that is best for her.

Any set C, for which $\gamma$ has the desired value, serves
the purpose. A smaller set C′, in the sense that $C'_D$ is
contained in $C_D$ for all D, is preferable, but usually there
is no smallest set of confidence regions. Here, "smaller"
is solely in this inclusion sense, with no reference to a
quantification of the size of a region and, therefore, there
is no necessity of specifying the prior probability of any
region. Since the transition from set C to the smaller
set C′ requires the shrinking of some of the $C_D$s without
enlarging even a single one, it is easily possible to have
two sets of confidence regions with the same confidence
level and neither set smaller than the other.
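
The worst-case character of Eq. (31) is easy to explore numerically for the coin-like example with N = 2. In the sketch below, the three intervals are hypothetical and chosen only for illustration, both click sequences with counts (1, 1) are assumed to share one region, and the minimum over states is approximated on a grid of p_1 values.

```python
def coverage(p1, regions):
    """Sum over all click sequences D of L(D|p) for which p1 lies in C_D."""
    p2 = 1 - p1
    # (counts, number of sequences with these counts, likelihood of one sequence)
    data = [((2, 0), 1, p1 * p1), ((1, 1), 2, p1 * p2), ((0, 2), 1, p2 * p2)]
    total = 0.0
    for counts, multiplicity, lik in data:
        lo, hi = regions[counts]
        if lo <= p1 <= hi:
            total += multiplicity * lik
    return total

def confidence_level(regions, grid_points=1001):
    """Worst-case coverage as in Eq. (31), approximated on a grid of p1."""
    return min(coverage(i / (grid_points - 1), regions)
               for i in range(grid_points))

# Hypothetical set of confidence regions (intervals of p1), keyed by counts.
regions = {(2, 0): (0.4, 1.0), (1, 1): (0.1, 0.9), (0, 2): (0.0, 0.6)}
gamma = confidence_level(regions)
print(gamma)
```

For these intervals the coverage dips to about 0.81 just outside the central interval, which illustrates that the confidence level is a property of the whole set of regions rather than of any single one.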

For illustration, we consider the harmonic-oscillator
example of Sec. II A yet another time. Figure 3 shows
two sets of confidence regions ($\gamma = 0.8$) and the
corresponding three SCRs (c = 0.8) for the primitive prior and
the Jeffreys prior. Both sets of confidence regions are
optimal in the sense that one cannot shrink even one of
the regions without decreasing the confidence level, but
neither set is smaller than the other. In the absence of
additional criteria that specify a preference, both work
equally well as sets of confidence regions.

FIG. 3: Confidence regions and smallest credible regions.
The bars indicate intervals of $p_1 = 1 - p_2$ for the
harmonic-oscillator example of Sec. II A, which has the reconstruction
space of a tossed coin. Two copies are measured. The left
solid bars indicate the regions for $(n_1, n_2) = (0, 2)$ counts; the
right solid bars are for $(n_1, n_2) = (2, 0)$; and the central open
bars are for $(n_1, n_2) = (1, 1)$. Cases (a) and (b) show two sets
of confidence regions for confidence level $\gamma = 0.8$. Regions (c)
and (d) are the SCRs for the primitive prior and the Jeffreys
prior, respectively, both for credibility c = 0.8.

We observe in this example that confidence regions 
tend to overlap a lot, which is indeed unavoidable if a 
large confidence level is desired. By contrast, the SCRs 
for different data usually do not overlap unless the data 
are quite similar. In Fig. 3, there is no overlap of the
SCRs for $(n_1, n_2) = (0, 2)$ and $(2, 0)$.

An important difference of considerable concern in all 
practical applications is the following. Once the data are 
obtained, there is the MLR and the SCR for these data, 
and it plays no role what other MLRs or SCRs are asso- 
ciated with different data that have not been observed. 
To find the confidence region for the actual data, how- 
ever, one must first specify the whole set C of confidence 
regions because the confidence level of Eq. (31) is a property
of the whole set. Christandl and Renner have
shown that one can choose high-credibility regions for the
$C_D$s [10], and Blume-Kohout [11] has argued that a set C
composed of BLRs can be a pretty good set of confidence
regions.



IV. CHOOSING THE PRIOR 

The assignment of prior probabilities to regions in the 
reconstruction space should be done in an unprejudiced 
manner while taking into account all prior information 
that might be available. We cannot do justice to the rich 
literature on this subject and are content with noting
that Ref. [18] reviews various approaches to constructing
unprejudiced priors. Let us discuss some criteria that are 
useful when choosing a prior. 

A general remark is this: The chosen prior should give 
some weight to (almost) all states, and it should not give 
extremely high weight to states in some part of the state 
space and extremely low weight to other states. This is 
to say that the prior should be consistent in the sense 
that the credibility of a region — its posterior content — 
is dominated by the data, rather than by the prior, if a 
reasonably large number $N$ of copies is measured. 

A. Uniformity 

The time-honored strategy of choosing a uniform prior 
gets us into a circular argument: The line of thought 
presented in Sec. II B implements this strategy and leads 
to identifying the prior content of a region with its size. 
But that just means that we are now asked to declare 
how we measure the size of a region without prejudice, 
which is the original question about the prior. 

In fact, there is no unique meaning of the uniformity 
of a prior. In the sense that each prior tells us how to 
quantify the size of a region, each prior is uniform with 
respect to its induced size measure. 

This point can be illustrated with the harmonic-oscillator 
example of Sec. II A. For the primitive prior 
of Sec. II B, the parameterization 

$$ p_1 = \tfrac{1}{2}(y + x), \quad p_2 = \tfrac{1}{2}(y - x), \quad dp_1\, dp_2 = \tfrac{1}{2}\, dx\, dy \qquad (32) $$

gives 

$$ (d\rho) = dx\, dy\, \tfrac{1}{2}\, \eta(y + x)\, \eta(y - x)\, \delta(y - 1) \to dx\, \tfrac{1}{2} \quad \text{with } -1 < x < 1, \qquad (33) $$

where we integrate over y in the last step and so observe 
that the primitive prior is uniform in x, that is: the size of 
the region $x_1 < x < x_2$ is proportional to $x_2 - x_1$. Likewise, 
the parameterization 

$$ p_1 = y (\sin\alpha)^2, \quad p_2 = y (\cos\alpha)^2, \quad dp_1\, dp_2 = d\alpha\, dy\, y \sin(2\alpha) \qquad (34) $$

gives 

$$ (d\rho) \to d\alpha\, \frac{2}{\pi} \quad \text{with } 0 < \alpha < \frac{\pi}{2} \qquad (35) $$

for the Jeffreys prior, which is uniform in $\alpha$. Other priors 
can be treated analogously, each of them yielding a 
uniform prior in an appropriate single parameter. 
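
The two statements about uniformity can be checked numerically. The following is a minimal sketch, assuming the two-outcome setting with $p_1 = (1+x)/2$, $p_2 = (1-x)/2$ and the normalized Jeffreys density $1/(\pi\sqrt{p_1 p_2})$ in $p_1$; the function names are ours and purely illustrative. It shows that two intervals of equal width in x have equal size for the primitive prior but different sizes for the Jeffreys prior, whose size is instead proportional to the interval in $\alpha$.

```python
import math

def size_primitive(x1, x2):
    """Prior content of x1 < x < x2 for the primitive prior, which is dx/2."""
    return 0.5 * (x2 - x1)

def alpha_of_x(x):
    """Angle alpha with p1 = sin(alpha)**2 = (1 + x)/2, so 0 <= alpha <= pi/2."""
    return math.asin(math.sqrt(0.5 * (1.0 + x)))

def size_jeffreys(x1, x2):
    """Prior content of x1 < x < x2 for the Jeffreys prior, which is (2/pi) d(alpha)."""
    return (2.0 / math.pi) * (alpha_of_x(x2) - alpha_of_x(x1))

def size_jeffreys_numeric(x1, x2, steps=20_000):
    """Independent check: midpoint-rule integral of dp1 / (pi * sqrt(p1 * p2))."""
    h = (x2 - x1) / steps
    total = 0.0
    for i in range(steps):
        x = x1 + (i + 0.5) * h
        p1, p2 = 0.5 * (1.0 + x), 0.5 * (1.0 - x)
        total += 0.5 * h / (math.pi * math.sqrt(p1 * p2))  # dp1 = dx/2
    return total

if __name__ == "__main__":
    # Two intervals of the same width in x, one central and one at the edge.
    for lo, hi in [(-0.2, 0.2), (0.6, 1.0)]:
        print(f"{lo:+.1f} < x < {hi:+.1f}: "
              f"primitive {size_primitive(lo, hi):.4f}, "
              f"Jeffreys {size_jeffreys(lo, hi):.4f}")
```

The same region thus has different sizes under the two priors, although each prior is uniform in its own parameter.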

The parameterizations in Eqs. (32) and (34) exhibit in 
which explicit sense the primitive prior and the Jeffreys 
prior are uniform. But the priors are what they are, 
irrespective of how they are parameterized. They are 
explicitly uniform in a particular parameterization and 
implicitly uniform in all others. Uniformity, it follows, 
cannot serve as a principle that distinguishes one prior 
from another. 

This ubiquity of uniform priors for a continuous set of 
infinitesimal probabilities is in marked contrast to situations 
in which prior probabilities are assigned to a finite 
number of discrete possibilities, such as the 38 pockets 
of a double-zero roulette wheel. Uniform probabilities 
of 1/38 suggest themselves, are meaningful, and clearly 
distinguished from other priors, all of which have a bias. 

Uniformity in a particularly natural parameterization 
of the probability space might also be meaningful. This, 
however, invokes a notion of "natural" that others may 
not share. 



B. Utility 

In many applications, estimating the state is not a pur- 
pose in itself, but only an intermediate step on the way 
to determining some particular property of the physical 
system. The objective is to find the value of a parameter 
that quantifies the utility of the state. 

For example, one could be interested in the fidelity of 
the actual state with a target state, or in an entanglement 
measure of a bipartite state, or in another quantity 
that tells us how useful the quantum-information carriers 
are for their intended task. In a situation of this 
kind, one should, if possible, use a prior that is uniform 
in the utility parameter of interest. 

As a simple example, consider a single qubit. The 
utility parameter is the purity $\xi(\rho) = \mathrm{tr}\{\rho^2\}$ of the state 
$\rho$. With the Bloch-ball representation of a qubit state, 
$\rho = \frac{1}{2}(1 + \boldsymbol{r}\cdot\boldsymbol{\sigma})$, where $\boldsymbol{r} = \mathrm{tr}\{\boldsymbol{\sigma}\rho\} = \langle\boldsymbol{\sigma}\rangle$ is the Bloch 
vector and $\boldsymbol{\sigma}$ is the vector of Pauli matrices, the purity 
is 

$$ \xi = \tfrac{1}{2}(1 + r^2) \quad \text{with } r = |\boldsymbol{r}|. \qquad (36) $$

A prior uniform in purity induces a prior on the state 
space according to 

$$ (d\rho) \propto d\xi\, d\Omega \propto r\, dr\, d\Omega, \qquad (37) $$

where we parameterize the Bloch ball by spherical coordinates 
$(r,\theta,\phi)$. Here, $d\Omega$ is the prior for the angular 
coordinates; the prior for the radial coordinate r is fixed 
by our choice of uniformity in $\xi$. Irrespective of what we 
choose for $d\Omega$, the marginal prior for r is uniform in $\xi$. 

If one can quantify the utility of an estimator by a 
cost function, an optimal prior can be selected by a min- 
imax strategy: For each prior in the competition one 
determines the maximum of the cost function over the 
states in the reconstruction space, and then chooses the 
prior for which the maximum cost is minimal. In classical 
statistics, such minimax strategies are common (see, 
for instance, Chapter 5 in Ref. [25]); for an example in 
the context of quantum state estimation, see Ref. [26]. 
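
The minimax selection can be sketched for the textbook coin problem. Everything specific in this sketch is an assumption made for illustration and is not taken from the text: the competing priors are the symmetric Beta(a, a) densities, the estimator is the posterior mean (n + a)/(N + 2a), and the cost is its mean squared error. For this classical case the constant-risk choice a = sqrt(N)/2 is known to be minimax.

```python
import math

def max_risk(a, n_copies, grid=2001):
    """Largest mean squared error, over p in [0, 1], of the estimator
    (n + a) / (N + 2a), the posterior mean for a Beta(a, a) prior."""
    worst = 0.0
    for i in range(grid):
        p = i / (grid - 1)
        variance_term = n_copies * p * (1.0 - p)   # variance of the count n
        bias_term = (a * (1.0 - 2.0 * p)) ** 2     # squared bias times (N + 2a)^2
        risk = (variance_term + bias_term) / (n_copies + 2.0 * a) ** 2
        worst = max(worst, risk)
    return worst

def minimax_prior(candidates, n_copies):
    """Return the candidate 'a' for which the maximum risk is smallest."""
    return min(candidates, key=lambda a: max_risk(a, n_copies))

if __name__ == "__main__":
    N = 24
    # a = 0.5 is the Jeffreys prior and a = 1 the primitive prior of the coin.
    candidates = [0.5, 1.0, 2.0, math.sqrt(N) / 2.0, 4.0]
    for a in candidates:
        print(f"a = {a:.3f}: maximum risk = {max_risk(a, N):.6f}")
    print("minimax choice: a =", minimax_prior(candidates, N))
```

Other cost functions or other families of priors would lead to other winners; the sketch only illustrates the two-step structure of the strategy, maximizing over states first and minimizing over priors second.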



C. Symmetry 

Symmetry considerations are often helpful in narrow- 
ing the search for the appropriate prior. For a partic- 
ularly instructive example, see Sec. 12.4.4 in Jaynes's 
posthumous book [27]. 

Returning to the uniform-in-purity prior of Eq. (37), 
one can invoke rotational symmetry in favor of the usual 
solid-angle element, $d\Omega = \sin\theta\, d\theta\, d\phi$, as the choice of 
angular prior. The reasoning is as follows: The purity of 
a qubit state does not change under unitary transformations; 
unitarily equivalent states have the same purity. 
Now, regions that are turned into each other by a unitary 
transformation have identical radial content whereas 
the angular dependences are related by a rotation. 
Invariance under rotations, in turn, requires that the prior 
is proportional to the solid angle, hence the identification 
of $d\Omega$ with the differential of the solid angle. Note 
that the resulting prior element $(d\rho)$ is different from the 
usual Euclidean volume element, $r^2\, dr \sin\theta\, d\theta\, d\phi$, which 
would be natural if the Bloch ball were an object in the 
physical three-dimensional space. But it ain't. 
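
The difference between the two volume elements shows up directly when one samples Bloch vectors. The sketch below assumes the rotationally symmetric, uniform-in-purity prior $r\, dr \sin\theta\, d\theta\, d\phi$ and uses inverse-transform sampling; the function names are ours. For this prior the fraction of states with $r < 1/2$ is $1/4$, whereas the Euclidean volume element would give $1/8$.

```python
import math
import random

def sample_bloch_vector(rng):
    """Draw one Bloch vector from the prior r dr sin(theta) dtheta dphi.

    Radial part: the cumulative distribution of r dr on [0, 1] is r**2,
    so r = sqrt(u) makes the purity (1 + r**2)/2 uniform on [1/2, 1].
    Angular part: cos(theta) uniform on [-1, 1], phi uniform on [0, 2*pi).
    """
    r = math.sqrt(rng.random())
    cos_theta = 2.0 * rng.random() - 1.0
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    phi = 2.0 * math.pi * rng.random()
    return (r * sin_theta * math.cos(phi),
            r * sin_theta * math.sin(phi),
            r * cos_theta)

def purity(bloch):
    """Purity tr(rho^2) = (1 + r^2)/2 for the qubit state with this Bloch vector."""
    return 0.5 * (1.0 + sum(c * c for c in bloch))

if __name__ == "__main__":
    rng = random.Random(1)
    values = [purity(sample_bloch_vector(rng)) for _ in range(100_000)]
    mean = sum(values) / len(values)
    # r < 1/2 is the same as purity < 5/8
    inner = sum(1 for v in values if v < 0.625) / len(values)
    print(f"mean purity: {mean:.3f} (uniform on [1/2, 1] has mean 3/4)")
    print(f"fraction with r < 1/2: {inner:.3f} (Euclidean volume would give 1/8)")
```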

Symmetry arguments should be used carefully and not 
blindly. For a fairly tossed coin, the prior should not 
be affected if the probabilities for heads and tails are 
interchanged, $w(p_1,p_2) = w(p_2,p_1)$. However, for the 
harmonic-oscillator example of Sec. II A, which has the 
same reconstruction space as the coin, there is poor jus- 
tification for requiring this symmetry because the two 
probabilities — of finding the oscillator in its ground state, 
or not — are not on equal footing. 



D. Invariance 

When one speaks of an invariant prior, one does not 
mean the invariance under a change of parameterization 
— all priors are invariant in this respect — but rather a 
form-invariant construction in terms of a quantity that, 
preferably, has an invariant significance. We consider 
two particular constructions that make use of the met- 
ric induced by the response of the selected function to 
infinitesimal changes of its variables. 

The first construction begins with a quantity $F(p)$ that 
is a function of all probabilities $p = (p_1, \ldots, p_K)$. We 
include the square root of the determinant of the dyadic 
second derivative in the prior density factor, 

$$ (d\rho) = (dp) \left| \det\left( \frac{\partial^2 F}{\partial p_j\, \partial p_k} \right)_{jk} \right|^{1/2} w_{\mathrm{cstr}}(p), \qquad (38) $$

where $w_{\mathrm{cstr}}(p)$ contains all the delta-function and step-function 
factors of constraint as well as the normalization 
factor that ensures the unit size of the reconstruction 
space [28]. The prior defined by Eq. (38) is invariant in 
the sense that a change of parameterization, from $p$ to $\alpha$, 
say, does not affect its structure, 



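$$ (d\rho) = (d\alpha) \left| \det\left( \frac{\partial^2 F}{\partial \alpha_j\, \partial \alpha_k} \right)_{jk} \right|^{1/2} w_{\mathrm{cstr}}, \qquad (39) $$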

because the various Jacobian determinants take care of 
each other. 
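
The first construction is easy to try out numerically. In the following sketch we assume that the $K$ probabilities are treated as independent variables when the second derivatives are taken, with all constraints left to the $w_{\mathrm{cstr}}(p)$ factor; the finite-difference Hessian and the helper names are ours. For the Shannon entropy the square root of the determinant reproduces $1/\sqrt{p_1 p_2 \cdots p_K}$, and for the purity it is a constant.

```python
import math

def hessian(func, p, h=1e-4):
    """Central finite-difference matrix of second derivatives of func at p."""
    k = len(p)
    hess = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            def shifted(di, dj):
                q = list(p)
                q[i] += di
                q[j] += dj
                return func(q)
            hess[i][j] = (shifted(h, h) - shifted(h, -h)
                          - shifted(-h, h) + shifted(-h, -h)) / (4.0 * h * h)
    return hess

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det

def sqrt_det_factor(func, p):
    """Square root of the modulus of the Hessian determinant of func at p."""
    return math.sqrt(abs(determinant(hessian(func, p))))

def shannon_entropy(p):
    return -sum(x * math.log(x) for x in p)

def purity(p):
    return sum(x * x for x in p)

if __name__ == "__main__":
    p = [0.2, 0.3, 0.5]
    print("entropy:", sqrt_det_factor(shannon_entropy, p),
          "vs 1/sqrt(p1 p2 p3) =", 1.0 / math.sqrt(p[0] * p[1] * p[2]))
    print("purity :", sqrt_det_factor(purity, p), "(independent of p)")
```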

For the second construction, we use a data-dependent 
function $G(p, \nu)$ of the probabilities $p$ and the frequencies 
$\nu = (\nu_1, \nu_2, \ldots, \nu_K)$ with $\nu_j = n_j/N$. Here, the square 
root of the determinant of the expected value of the 
dyadic square of the p-gradient of G is a factor in the 
prior density [28], 

$$ (d\rho) = (dp) \left| \det\left( \overline{ \frac{\partial G}{\partial p_j}\, \frac{\partial G}{\partial p_k} } \right)_{jk} \right|^{1/2} w_{\mathrm{cstr}}(p), \qquad (40) $$



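where $\overline{f(\nu)}$ denotes the expected value of $f(\nu)$, 

$$ \overline{f(\nu)} = \sum_D L(D|p)\, f(\nu). \qquad (41) $$

We have, in particular, the generating function 

$$ \overline{\exp\left( \sum_{k=1}^{K} a_k \nu_k \right)} = \left( \sum_{k=1}^{K} p_k\, e^{a_k/N} \right)^{N} \qquad (42) $$

for the expected values of products of the $\nu_k$s. The prior 
defined by Eq. (40) is form-invariant in the same sense, 
and for the same reason, as the prior of Eq. (38). 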
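
For small $N$ the expected values in the second construction can be evaluated exactly by enumeration. The sketch below is ours and rests on two assumptions: the sum over all click sequences with the same counts is replaced by the multinomial weight of the counts, and the $K$ probabilities are treated as independent variables in the gradient. With $G$ taken as the inner product $\sum_k p_k \nu_k$ the determinant comes out proportional to $p_1 p_2 \cdots p_K$, and with the relative entropy $\sum_k \nu_k \log(\nu_k/p_k)$ it is proportional to $1/(p_1 p_2 \cdots p_K)$, the constant being $N^{1-K}$ in both cases.

```python
import math
from itertools import product

def count_vectors(n_copies, k):
    """All count vectors (n_1, ..., n_k) with n_1 + ... + n_k = n_copies."""
    for head in product(range(n_copies + 1), repeat=k - 1):
        rest = n_copies - sum(head)
        if rest >= 0:
            yield head + (rest,)

def multinomial_weight(counts, p):
    """Total probability of all click sequences that have these counts."""
    weight, remaining = 1.0, sum(counts)
    for n, pk in zip(counts, p):
        weight *= math.comb(remaining, n) * pk ** n
        remaining -= n
    return weight

def expected_dyadic(grad, p, n_copies):
    """Expected value of the dyadic square of the p-gradient of G."""
    k = len(p)
    mat = [[0.0] * k for _ in range(k)]
    for counts in count_vectors(n_copies, k):
        w = multinomial_weight(counts, p)
        g = grad(p, [n / n_copies for n in counts])
        for i in range(k):
            for j in range(k):
                mat[i][j] += w * g[i] * g[j]
    return mat

def det3(m):
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def grad_inner_product(p, nu):
    """p-gradient of G = sum_k p_k nu_k."""
    return list(nu)

def grad_relative_entropy(p, nu):
    """p-gradient of G = sum_k nu_k log(nu_k / p_k)."""
    return [-v / x for v, x in zip(nu, p)]

if __name__ == "__main__":
    p, N = (0.2, 0.3, 0.5), 6
    prod_p = p[0] * p[1] * p[2]
    print("inner product   :", det3(expected_dyadic(grad_inner_product, p, N)),
          "vs", prod_p / N ** 2)
    print("relative entropy:", det3(expected_dyadic(grad_relative_entropy, p, N)),
          "vs", 1.0 / (prod_p * N ** 2))
```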

Table I reports a few examples of "$\sqrt{\det}$" factors constructed 
by one of these two methods. It is worth noting 
that the Jeffreys prior can be obtained from the entropy 
of the probabilities by the first method as well as from 
the relative entropy between the probabilities and the fre- 
quencies by the second method. The latter is a variant of 
Jeffreys's original derivation [17] in terms of the Fisher 
information. 



TABLE I: Form-invariant priors constructed by one of the two 
methods described in the text. The "$\sqrt{\det}$" column gives 
the p-dependent factors only and omits all p-independent 
constants. The first method [Eq. (38)] proceeds from functions 
of the probabilities that have extremal values when all 
probabilities are equal or all vanish save one. The second 
method [Eq. (40)] uses functions that quantify how similar 
are the probabilities and the frequencies. The "hedged prior" 
is named in analogy to the "hedged likelihood" [29]. 

method   primary function                                      $\sqrt{\det}$
1st      $-\sum_k p_k \log p_k$ (Shannon entropy)              $1/\sqrt{p_1 p_2 \cdots p_K}$ (Jeffreys prior)
1st      $\sum_k p_k^2$ (purity)                               $1$ (primitive prior)
2nd      $\sum_k p_k \nu_k$ (inner product)                    $\sqrt{p_1 p_2 \cdots p_K}$ (hedged prior)
2nd      $\sum_k \nu_k \log(\nu_k/p_k)$ (relative entropy)     $1/\sqrt{p_1 p_2 \cdots p_K}$ (Jeffreys prior)



E. Conjugation 

Sometimes there are reasons to expect that the actual 
state is close to a certain target state with probabilities 
$t = (t_1, t_2, \ldots, t_K)$. This is the situation, for example, 
when a source is designed to emit the quantum-information 
carriers in a particular state. A conjugate 
prior 

$$ (d\rho) = (dp) \left( p_1^{t_1} p_2^{t_2} \cdots p_K^{t_K} \right)^{\alpha} w_{\mathrm{cstr}}(p) \quad \text{with } \alpha > 0 \qquad (43) $$

could then be a natural choice [30]. The $(\cdots)^{\alpha}$ factor is 
maximal for $p = t$, and the peak is narrower when $\alpha$ is 
larger. 

The conjugate prior can be understood as the "mock 
posterior" for the primitive prior that results from pretending 
that $\alpha$ copies have been measured in the past 
and data obtained that are most typical for the target 
state. Therefore, a conjugate prior is quite a natural way 
of expressing the expectation that the apparatus is functioning 
well. The posterior content of a region will be 
data-dominated only if $N$ is much larger than $\alpha$. 

In this context, it may be worth noting that the 
Bayesian mean state, 

$$ \rho_{\mathrm{BM}} = \int_{\mathcal{R}_0} (d\rho)\, \rho, \qquad (44) $$

computed with the conjugate prior above, is usually not 
the target state unless $\alpha$ is large. One could construct 
priors for which $\rho_{\mathrm{BM}}$ is the target state, but the presence 
of the $w_{\mathrm{cstr}}(p)$ factor requires a case-by-case construction. 
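
The gap between the peak of a conjugate prior and its mean is seen most easily for a coin. The sketch below assumes two outcomes with no constraint beyond $p_1 + p_2 = 1$, so that the conjugate prior is a Beta density in $p_1$; the name `alpha` for the exponent and the helper functions are ours.

```python
def conjugate_beta_parameters(t1, alpha):
    """The prior (p1**t1 * p2**t2)**alpha with t2 = 1 - t1 and p2 = 1 - p1
    is the Beta(alpha*t1 + 1, alpha*t2 + 1) density in p1."""
    return alpha * t1 + 1.0, alpha * (1.0 - t1) + 1.0

def prior_mode(t1, alpha):
    """Location of the peak of the conjugate prior (requires alpha > 0)."""
    a, b = conjugate_beta_parameters(t1, alpha)
    return (a - 1.0) / (a + b - 2.0)

def prior_mean(t1, alpha):
    """Mean of p1 under the conjugate prior, the coin analog of the Bayesian mean."""
    a, b = conjugate_beta_parameters(t1, alpha)
    return a / (a + b)

if __name__ == "__main__":
    t1 = 0.9  # hypothetical target probability
    for alpha in (1.0, 5.0, 50.0, 500.0):
        print(f"alpha = {alpha:6.1f}: mode = {prior_mode(t1, alpha):.4f}, "
              f"mean = {prior_mean(t1, alpha):.4f}")
```

The mode sits at the target value for every positive exponent, whereas the mean, $(\alpha t_1 + 1)/(\alpha + 2)$ in this case, approaches the target only when the exponent is large.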



F. Marginalization 

All priors used as examples (the ones in Table I and 
Eqs. (33), (35), (43)) have in common that they are defined 
in terms of the probabilities and, therefore, they 
refer to the particular POM with which the data are col- 
lected. While this pays due account to the significance 
of the data, it does not seem to square with the point 
of view that prior probabilities are solely a property of 
the physical processes that put the quantum-information 
carriers into the state that is then diagnosed by the POM. 
When adopting this viewpoint, one begins with a prior 
density defined on the entire state space. In addition to 
the parameters that specify the reconstruction space (es- 
sentially the probabilities p) , this full-space prior will de- 
pend on parameters whose values are not determined by 
the data. There could be very many nuisance parame- 
ters of this kind, as illustrated by the somewhat extreme 
harmonic-oscillator example of Sec. II A. Upon integrating 
the full-space prior over the nuisance parameters, one 
obtains a marginal prior on the reconstruction space. As 
a function on the reconstruction space, the marginal prior 
is naturally parameterized in terms of the probabilities 
and so fits into the formalism we are using throughout. 

Harking back to the last paragraph in Sec. II A, we 
note that the invoking of "additional criteria or princi- 
ples" is exactly what would be required if one wishes 
to report estimated values of the nuisance parameters. 
That, however, goes beyond making statements that are 
solidly supported by the data and is, therefore, outside 
the scope of this article. 

The symmetric uniform-in-purity prior of Secs. IV B 
and IV C provides an example for marginalization if the 
POM only gives information about $x = \langle\sigma_x\rangle$ and $y = \langle\sigma_y\rangle$ 
but not about $z = \langle\sigma_z\rangle$. We express the full-space prior 
in Cartesian coordinates, integrate over z, and arrive at 

$$ (d\rho) = dx\, dy\, \frac{1}{2\pi} \int dz\, \frac{\eta(1 - x^2 - y^2 - z^2)}{\sqrt{x^2 + y^2 + z^2}} = dx\, dy\, \frac{1}{\pi}\, \eta(1 - x^2 - y^2)\, \cosh^{-1}\!\frac{1}{\sqrt{x^2 + y^2}}. \qquad (45) $$

This marginal prior is a function on the unit disk in the 
xy plane, which is the natural choice of reconstruction 
space here. When one expresses $(d\rho)$ in polar coordinates, 
$x + iy = s\,e^{i\varphi}$ with $s \geq 0$, one sees that $(d\rho)$ is 
uniform in $\varphi$ and in $s^2 \cosh^{-1}(1/s) - \sqrt{1 - s^2}$, which 
increases monotonically from $-1$ to $0$ on the way from the 
center of the disk at $s = 0$ to the unit circle where $s = 1$. 
Plot (a) in Fig. 4 illustrates the matter. 
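
The marginalization can be verified numerically. The following sketch, with function names of our choosing, compares the closed form $\cosh^{-1}(1/s)/\pi$ with a direct midpoint-rule integration of $1/(2\pi r)$ over z, where $r = \sqrt{x^2 + y^2 + z^2}$, and evaluates the prior content of the disk of radius s, which is the radial function quoted above shifted by 1.

```python
import math

def marginal_density(x, y):
    """Closed form of the marginal prior density on the unit disk."""
    s = math.hypot(x, y)
    if s >= 1.0:
        return 0.0
    if s == 0.0:
        return math.inf  # integrable logarithmic peak at the center
    return math.acosh(1.0 / s) / math.pi

def marginal_density_numeric(x, y, steps=20_000):
    """Midpoint-rule integral over z of 1/(2*pi*r) inside the Bloch ball."""
    s_sq = x * x + y * y
    if s_sq >= 1.0:
        return 0.0
    z_max = math.sqrt(1.0 - s_sq)
    h = 2.0 * z_max / steps
    total = 0.0
    for i in range(steps):
        z = -z_max + (i + 0.5) * h
        total += h / (2.0 * math.pi * math.sqrt(s_sq + z * z))
    return total

def radial_content(s):
    """Prior content of the centered disk with radius s."""
    if s <= 0.0:
        return 0.0
    if s >= 1.0:
        return 1.0
    return s * s * math.acosh(1.0 / s) - math.sqrt(1.0 - s * s) + 1.0

if __name__ == "__main__":
    print(marginal_density(0.3, 0.4), marginal_density_numeric(0.3, 0.4))
    for s in (0.25, 0.5, 0.75, 1.0):
        print(f"content of the disk with s < {s}: {radial_content(s):.4f}")
```

Eight equal steps of `radial_content` would give the radii of the "tree rings" for this prior.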



V. EXAMPLES 

For illustration, we consider the simplest situation that 
exhibits the typical features: The quantum-information 
carriers have a qubit degree of freedom, which is mea- 
sured by one of two standard POMs that are not infor- 
mationally complete. 



A. POMs and priors 

For both POMs, the unit disk in the xy plane suggests 
itself for the reconstruction space $\mathcal{R}_0$. The first POM 
combines projective measurements of $\sigma_x$ and $\sigma_y$ into a 







FIG. 4: Uniform tilings of the unit disk for four different 
priors. The disk is in the xy plane, with the x axis horizontal, 
the y axis vertical, and the disk center at $x = y = 0$. Tiling 
(a) is for the marginal prior of Eq. (45); tiling (b) depicts the 
primitive prior of Eq. (51); tilings (c1) and (c2) illustrate the 
Jeffreys prior of Eq. (52) with the blue dots (•) just outside 
the unit circle indicating the four directions onto which the 
POM outcomes project; and tilings (d1) and (d2) are for the 
Jeffreys prior of Eq. (53), the blue dots marking the three 
directions of the trine projectors. In each tiling, we identify 
96 regions of equal size by dividing the disk into eight "tree 
rings" of equal size and twelve "pie slices" of equal size. In 
the tilings (a), (b), (c1), and (d1), the boundaries of the pie 
slices are (red) rays and an arc of the unit circle; in the tilings 
(a), (b), (c2), and (d2), the tree rings have concentric circles 
as their boundaries. 



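four-outcome POM ($K = 4$) with probabilities 

$$ \left.\begin{array}{c} p_1 \\ p_2 \end{array}\right\} = \tfrac{1}{4}(1 \pm x), \qquad \left.\begin{array}{c} p_3 \\ p_4 \end{array}\right\} = \tfrac{1}{4}(1 \pm y). \qquad (46) $$

The permissible probabilities are identified by 

$$ w_{\mathrm{cstr}}(p) \doteq \eta(p)\, \delta\!\left(p_1 + p_2 - \tfrac{1}{2}\right) \delta\!\left(p_3 + p_4 - \tfrac{1}{2}\right) \eta(3 - 8p^2), \qquad (47) $$

where 

$$ \eta(p) = \prod_{k=1}^{K} \eta(p_k) \quad \text{and} \quad p^2 = \sum_{k=1}^{K} p_k^2. \qquad (48) $$

The dotted equal sign in Eq. (47) stands for "equal up 
to a multiplicative constant," namely the factor that ensures 
the unit size of the reconstruction space. 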

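The second POM is the three-outcome trine measurement 
($K = 3$), whose outcomes are sub-normalized projectors 
on the eigenstates of $\sigma_x$ and $(-\sigma_x \pm \sqrt{3}\,\sigma_y)/2$ 
with eigenvalue $+1$. It has the probabilities 

$$ p_1 = \tfrac{1}{3}(1 + x), \qquad \left.\begin{array}{c} p_2 \\ p_3 \end{array}\right\} = \tfrac{1}{6}\left(2 - x \pm \sqrt{3}\, y\right), \qquad (49) $$

for which 

$$ w_{\mathrm{cstr}}(p) \doteq \eta(p)\, \delta(p_1 + p_2 + p_3 - 1)\, \eta(1 - 2p^2) \qquad (50) $$

summarizes the constraints that the permissible values 
of $p_1$, $p_2$, $p_3$ obey. 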
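
The constraint factors can be checked against the geometry of the unit disk. The sketch below is ours: it computes the probabilities of both POMs from $(x, y)$ and tests the delta-function and step-function conditions, using the identities $8p^2 = 2 + x^2 + y^2$ for the four-outcome POM and $2p^2 = \frac{2}{3} + \frac{1}{3}(x^2 + y^2)$ for the trine, so that the last step function in each case is equivalent to $x^2 + y^2 \leq 1$.

```python
import math

def four_outcome_probabilities(x, y):
    """Probabilities of the four-outcome POM for the disk point (x, y)."""
    return [(1 + x) / 4, (1 - x) / 4, (1 + y) / 4, (1 - y) / 4]

def trine_probabilities(x, y):
    """Probabilities of the three-outcome trine POM for the disk point (x, y)."""
    root3 = math.sqrt(3.0)
    return [(1 + x) / 3, (2 - x + root3 * y) / 6, (2 - x - root3 * y) / 6]

def sum_of_squares(p):
    return sum(pk * pk for pk in p)

def is_permissible_four(p, tol=1e-12):
    """All constraints of the four-outcome POM, up to rounding tolerance."""
    return (all(pk >= 0 for pk in p)
            and abs(p[0] + p[1] - 0.5) < tol
            and abs(p[2] + p[3] - 0.5) < tol
            and 3 - 8 * sum_of_squares(p) >= -tol)

def is_permissible_trine(p, tol=1e-12):
    """All constraints of the trine POM, up to rounding tolerance."""
    return (all(pk >= 0 for pk in p)
            and abs(sum(p) - 1.0) < tol
            and 1 - 2 * sum_of_squares(p) >= -tol)

if __name__ == "__main__":
    for x, y in [(0.6, 0.2), (0.9, 0.9)]:
        p4 = four_outcome_probabilities(x, y)
        p3 = trine_probabilities(x, y)
        print((x, y), "inside disk:", x * x + y * y <= 1,
              "| four-outcome:", is_permissible_four(p4),
              "| trine:", is_permissible_trine(p3))
```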

Both POMs have the same primitive prior, 

$$ (d\rho) = dx\, dy\, \frac{1}{\pi}\, \eta(1 - x^2 - y^2) = ds^2\, \frac{d\varphi}{2\pi}, \qquad (51) $$

where $0 \leq s \leq 1$ and $\varphi$ covers any convenient range of $2\pi$. 
This prior is uniform in x and y, and in $s^2$ and $\varphi$. The 
polar-coordinate version is the more natural parameterization 
of the unit disk; it is used for plot (b) in Fig. 4. 
The Jeffreys prior for the four-outcome POM is [31] 



$$ (d\rho) \doteq \frac{ds\, s\, d\varphi}{\sqrt{1 - s^2 + \tfrac{1}{4} s^4 \sin(2\varphi)^2}}. \qquad (52) $$

Plots (c1) and (c2) in Fig. 4 show uniform tilings of the 
unit disk for this prior. For the three-outcome POM, we 
have the Jeffreys prior [31] 



$$ (d\rho) \doteq \frac{ds\, s\, d\varphi}{\sqrt{1 - \tfrac{3}{4} s^2 + \tfrac{1}{4} s^3 \cos(3\varphi)}} \qquad (53) $$

and the tilings of plots (d1) and (d2) in Fig. 4. The 
cross-hairs symmetry of the four-outcome POM and the 
trine symmetry of the three-outcome POM are manifest 
in their respective uniform tilings. 
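
The polar-coordinate forms of the two Jeffreys priors follow from the products of the probabilities, since the Jeffreys density is proportional to $1/\sqrt{p_1 p_2 \cdots p_K}$. The sketch below, with helper names of our own, checks the two identities $256\, p_1 p_2 p_3 p_4 = 1 - s^2 + \frac{1}{4} s^4 \sin(2\varphi)^2$ and $27\, p_1 p_2 p_3 = 1 - \frac{3}{4} s^2 + \frac{1}{4} s^3 \cos(3\varphi)$ on a grid of disk points.

```python
import math

def product_four(s, phi):
    """p1*p2*p3*p4 for the four-outcome POM at the point s*exp(i*phi)."""
    x, y = s * math.cos(phi), s * math.sin(phi)
    return ((1 + x) / 4) * ((1 - x) / 4) * ((1 + y) / 4) * ((1 - y) / 4)

def product_trine(s, phi):
    """p1*p2*p3 for the trine POM at the point s*exp(i*phi)."""
    x, y = s * math.cos(phi), s * math.sin(phi)
    root3 = math.sqrt(3.0)
    return ((1 + x) / 3) * ((2 - x + root3 * y) / 6) * ((2 - x - root3 * y) / 6)

def radicand_four(s, phi):
    """Expression under the square root for the four-outcome Jeffreys prior."""
    return 1 - s ** 2 + 0.25 * s ** 4 * math.sin(2 * phi) ** 2

def radicand_trine(s, phi):
    """Expression under the square root for the trine Jeffreys prior."""
    return 1 - 0.75 * s ** 2 + 0.25 * s ** 3 * math.cos(3 * phi)

def largest_mismatch(grid=40):
    """Largest deviation from the two identities on a polar grid."""
    worst = 0.0
    for i in range(grid + 1):
        for j in range(grid):
            s, phi = i / grid, 2 * math.pi * j / grid
            worst = max(worst,
                        abs(256 * product_four(s, phi) - radicand_four(s, phi)),
                        abs(27 * product_trine(s, phi) - radicand_trine(s, phi)))
    return worst

if __name__ == "__main__":
    print("largest mismatch on the grid:", largest_mismatch())
```

The radicands vanish on the unit circle only in the four directions of the cross hairs and the three directions opposite to the trine projectors, respectively, which is where the Jeffreys densities have their integrable singularities.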



B. Simulated measurements 

Figures 5(a) and 5(b) show SCRs obtained for simulated 
experiments in which $N = 24$ copies of a qubit state 
are measured. The actual state used for the simulation 
has $x = 0.6$ and $y = 0.2$. Its position in the reconstruction 
space is indicated by the red star (*). 

In Fig. 5(a), we see the SCRs for the four-outcome 
POM. Two measurements were simulated, with 
$(n_1, n_2, n_3, n_4) = (8, 5, 10, 1)$ and $(6, 3, 10, 5)$ clicks of the 
detectors, respectively, and the triangles (▲) show the 
positions of the corresponding MLEs. For each data set, 
the plot reports the SCRs with credibility $c = 0.5$ and 
$c = 0.9$, both for the primitive prior of Eq. (51) and for 




FIG. 5: Smallest credible regions for simulated experiments. 
Twenty-four copies are measured by the POMs of Sec. V A, 
which have the unit disk of Fig. 4 as the reconstruction space. 
Plot (a) is for the four-outcome POM with the cross hairs 
indicating the orientations of the two projective measurements. 
Plot (b) is for the three-outcome measurement with 
the orientation of the trine indicated. The red star (*) at 
$(x, y) = (0.6, 0.2)$ marks the actual state that was used for 
the simulation. For each POM, there are SCRs for the data 
of two simulated experiments, with black triangles (▲) 
indicating the respective MLEs. The boundaries of the SCRs 
with credibility $c = 0.9$ are traced by the continuous lines; all 
of these SCRs contain the actual state. The dashed lines are 
the boundaries of the SCRs with credibility $c = 0.5$; the actual 
state is inside half of these SCRs. Red lines are for the 
primitive prior of Eq. (51), the blue lines are for the Jeffreys 
priors of Eqs. (52) and (53), respectively. The insets in 
the lower left corners show the size $s_\lambda$ and the credibility $c_\lambda$ 
for the BLRs of two simulated experiments. Inset (a) is for 
(6, 3, 10, 5) counts for the four-outcome POM and the Jeffreys 
prior; inset (b) is for (13, 7, 4) counts for the three-outcome 
POM and the primitive prior. The dots show the values computed 
with a Monte Carlo algorithm. There is much more 
scatter in the $c_\lambda$ values than the $s_\lambda$ values. The red lines are 
fits to the $s_\lambda$ values, with the fits using twice as many values 
as there are dots in the insets. The green lines that approximate 
the $c_\lambda$ values are obtained from the red lines with 
the aid of Eq. (25). 
the Jeffreys prior of Eq. (52). The actual state is inside 
two of the four SCRs with credibility $c = 0.5$ and is 
contained in all four SCRs with credibility $c = 0.9$. 

Not unexpectedly, we get quite different regions for 
the two rather different sets of detector click counts. 
Yet, we observe that the choice of prior has little ef- 
fect on the SCRs, although the total number of mea- 
sured copies is too small for relying on the consistency 
of the priors. The same remarks apply to the SCRs for 
the three-outcome POM in Fig. 5(b); here we counted 
$(n_1, n_2, n_3) = (15, 8, 1)$ and $(13, 7, 4)$ detector clicks in the 
simulated experiments. 

In Sec. III C, we remarked that the estimator regions 
are properly communicated by reporting $s_\lambda$ and $c_\lambda$ as 
functions of $\lambda$. This is accomplished by the insets in 
Fig. 5 for two of the four simulated experiments. The 
dots give the values obtained by numerical integration 
that uses a Monte Carlo algorithm. The scatter of these 
numerical values confirms the expected: The computation 
of $s_\lambda$ only requires sampling the probability space in 
accordance with the prior and determining the fraction 
of the sample that is in $\mathcal{R}_\lambda$; for the computation of $c_\lambda$ we 
need to add the values of $L(D|p)$ for the sample points 
inside $\mathcal{R}_\lambda$, and since $L(D|p)$ is a sharply peaked function 
of the probabilities, the $s_\lambda$ values are more trustworthy 
than the $c_\lambda$ values for the same computational effort. 
The line fitted to the $s_\lambda$ values is a Padé approximant 
(see, for example, section 5.12 in Ref. [32]) that takes the 
analytic forms near $\lambda = \lambda_0 = 0$ and $\lambda = 1$ into account. 
The line approximating the $c_\lambda$ values is then computed 
in accordance with Eq. (23). 
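
The Monte Carlo procedure just described can be sketched in a few lines. This is an illustration under our own assumptions, not the code behind Fig. 5: we use the four-outcome POM with the primitive prior, take the bounded-likelihood region for a given $\lambda$ to be the set of points whose likelihood is at least $\lambda$ times the maximal likelihood, and locate the maximum at $x = (n_1 - n_2)/(n_1 + n_2)$, $y = (n_3 - n_4)/(n_3 + n_4)$, which holds only when this point lies inside the unit disk.

```python
import math
import random

def log_likelihood(x, y, counts):
    """Logarithm of the point likelihood for the four-outcome POM."""
    probs = [(1 + x) / 4, (1 - x) / 4, (1 + y) / 4, (1 - y) / 4]
    return sum(n * math.log(p) for n, p in zip(counts, probs) if n > 0)

def mle_four_outcome(counts):
    """Maximum of the likelihood, valid if it falls inside the unit disk."""
    n1, n2, n3, n4 = counts
    x = (n1 - n2) / (n1 + n2)
    y = (n3 - n4) / (n3 + n4)
    if x * x + y * y > 1.0:
        raise ValueError("unconstrained maximum lies outside the unit disk")
    return x, y

def sample_primitive_prior(rng, size):
    """Points uniform on the unit disk: s**2 and the polar angle are uniform."""
    points = []
    for _ in range(size):
        s = math.sqrt(rng.random())
        phi = 2.0 * math.pi * rng.random()
        points.append((s * math.cos(phi), s * math.sin(phi)))
    return points

def size_and_credibility(points, counts, lambdas):
    """Monte Carlo estimates of (size, credibility) for each lambda."""
    log_max = log_likelihood(*mle_four_outcome(counts), counts)
    ratios = [math.exp(log_likelihood(x, y, counts) - log_max)
              for x, y in points]
    total = sum(ratios)
    results = []
    for lam in lambdas:
        inside = [r for r in ratios if r >= lam]
        # size: fraction of the sample in the region;
        # credibility: share of the summed likelihood in the region
        results.append((len(inside) / len(ratios), sum(inside) / total))
    return results

if __name__ == "__main__":
    rng = random.Random(2013)
    points = sample_primitive_prior(rng, 20_000)
    lambdas = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9]
    for lam, (s, c) in zip(lambdas,
                           size_and_credibility(points, (6, 3, 10, 5), lambdas)):
        print(f"lambda = {lam:.1f}: size = {s:.4f}, credibility = {c:.4f}")
```

For large $\lambda$ only few sample points fall into the region, which is one way of seeing why the credibility estimates scatter more than the size estimates.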

VI. OUTLOOK 

For the given data and chosen credibility, the SCR is 
a neighborhood of the MLE. In this sense, then, one can 
regard the SCR as identifying error bars on the param- 
eter values of the MLE in a systematic way. Thereby, 
the MLE is often a state whose probabilities equal the 
observed frequencies, and if there is no such state in the 
reconstruction space, efficient methods are at hand for 
computing the MLE. We are, however, currently lacking 
equally efficient algorithms for finding the SCR. 

Progress on this front is needed before one can apply 
the concepts of MLRs and SCRs to situations in which 
the reconstruction space is of high dimension. Upon 
recalling that informationally complete POMs for two-qubit 
systems already have a 15-dimensional reconstruction 
space, the need for powerful numerical schemes is 
utterly plain. 

In many applications, one is interested in a few param- 
eters only, perhaps a single one, such as the concurrence 
of a two-qubit state or its fidelity with a target state. 
It may then be possible to reduce the dimensionality of 
the problem by marginalizing the nuisance parameters, 
preferably proceeding from a utility-based prior. 

Even after such a reduction, there remains the chal- 
lenge of evaluating the multi-dimensional integrals that 
tell us the size of the BLRs, and then their credibility, 
so that we can identify the looked-for MLR and SCR. 
For this purpose one needs good sampling strategies [33]. 
It is suggestive to rely on the data themselves for guid- 
ance. The full sequence of detector clicks identifies the 
MLE of the data, and subsequences — chosen randomly 
or systematically — have their own MLEs. These boot- 
strapped MLEs are expected to accumulate in the vicin- 
ity of the full-data MLE and may so provide a useful 
sampling method. We have just begun to enter this un- 
explored territory and will report progress in due course. 

We close with a general observation. MLEs, MLRs, 
SCRs, and confidence regions are concepts of statistics, 
even if the terminology is not universal. As we have seen, 
the quantum aspect of the state estimation problem en- 
ters only through the Born rule which restricts the prob- 
abilities to those obtainable from a POM and a bona fide 
statistical operator. Except for these restrictions, there is 
no difference between state estimation in quantum me- 
chanics and standard statistics. Accordingly, quantum 
mechanicians can benefit much from the methods devel- 
oped by statisticians. 